Mental health challenges such as anxiety, depression, and stress among students in higher education are becoming a significant public health concern. A growing body of research highlights the socioeconomic and demographic factors that influence students’ mental health outcomes. For example, a study of college students in the USA found that mental health issues were associated with factors such as sex, race, ethnicity, religiosity, relationship status, living on campus, and financial situation (Eisenberg, Hunt, & Speer, 2013). Additionally, research suggests that female students are more likely than their male counterparts to experience higher levels of stress and depression (Hyde & Grabe, 2008; Matud, 2004). These differences are often linked to biological, psychological, and social factors, including coping mechanisms (Nolen-Hoeksema, 2012). Given these complexities, it is important to explore how gender influences students’ depression levels.
This research question is both relevant and timely, as mental health concerns among university students continue to rise. Factors such as academic pressure, social expectations, financial burdens, and job uncertainties contribute to these challenges (Beiter, Nash, McCrady, Rhoades, & Linscomb, 2015). By examining gender differences in depression, this study can add to the existing literature on student well-being and help shape interventions that address the specific needs of male and female students. Understanding these dynamics is particularly important for universities and policymakers in Bangladesh, as it can guide the development of gender-sensitive mental health programs, promote awareness, and establish effective support systems for students.
Research Question
How does gender influence depression levels among university students?
Hypothesis
Female university students experience higher levels of depression than male students.
The hypothesis that female university students experience higher levels of stress, anxiety, and depression than male students has been tested and supported by studies in other parts of the world. For example:
· A meta-analysis by Eisenberg et al. (2007) found that female college students were more likely to experience anxiety and depression than male students.
· A study by Bayram & Bilgel (2008) in Turkey also showed that female students had significantly higher levels of depression and anxiety than male students.
However, most of these studies were conducted in Western or specific non-Western contexts. The proposed hypothesis will help us understand whether this gender disparity in MHP also exists in a different socioeconomic and cultural context, such as Bangladesh.
About the data
This dataset offers insight into the Mental Health Problems (MHPs) of university students, specifically assessing stress, anxiety, and depression among students from 15 universities in Bangladesh. The dataset contains 2,028 student responses, collected from 9 public and 6 private universities.
To measure the level of mental health problems, the study employs three well-established psychological scales:
PSS-10 (Perceived Stress Scale-10): Assesses levels of perceived stress.
GAD-7 (Generalized Anxiety Disorder-7): Assesses levels of anxiety.
PHQ-9 (Patient Health Questionnaire-9): Evaluates symptoms of depression.
Alongside mental health assessments, the dataset includes sociodemographic variables such as age, gender, academic background, and university type (public/private). This enables a comprehensive analysis of the factors influencing students’ mental health.
The data (Syeed, et al., 2024) was collected through an online Google Forms survey, circulated via faculty representatives across the 15 universities. A team of five professors and a student psychologist ensured the adoption and validation of the three mental health scales. The survey was carefully designed to ensure internal consistency, reliability, and a sufficient sample size for meaningful analysis.
The questionnaire was divided into several sections, each focusing on different aspects of students’ mental health and academic experiences. The key variables of interest include:
Age: Categorized into ranges (e.g., Below 18, 18-22, 23-26, etc.).
Gender: Options included Male, Female, and Prefer not to say.
University: A list of universities in Bangladesh was provided for selection.
Department: Various academic departments were listed (e.g., Engineering, Business, Environmental Sciences, etc.).
Academic Year: Ranging from First Year to Fourth Year or equivalent.
Current CGPA: Ranges from Below 2.50 to 3.80 - 4.00.
Scholarship/Waiver: Whether the student received a waiver or scholarship.
Anxiety value
Anxiety label
Depression value
Depression label
Stress value
Stress label
library(ggplot2)
mhp <- read.csv("data/MHP_Processed.csv")
head(mhp, n = 5)
Age Gender University
1 18-22 Female Independent University, Bangladesh (IUB)
2 18-22 Male Independent University, Bangladesh (IUB)
3 18-22 Male American International University Bangladesh (AIUB)
4 18-22 Male American International University Bangladesh (AIUB)
5 18-22 Male North South University (NSU)
Department Academic_Year
1 Engineering - CS / CSE / CSC / Similar to CS Second Year or Equivalent
2 Engineering - CS / CSE / CSC / Similar to CS Third Year or Equivalent
3 Engineering - CS / CSE / CSC / Similar to CS Third Year or Equivalent
4 Engineering - CS / CSE / CSC / Similar to CS Third Year or Equivalent
5 Engineering - CS / CSE / CSC / Similar to CS Second Year or Equivalent
Current_CGPA waiver_or_scholarship PSS1 PSS2 PSS3 PSS4 PSS5 PSS6 PSS7 PSS8
1 2.50 - 2.99 No 3 4 3 2 2 1 2 2
2 3.00 - 3.39 No 3 3 4 2 3 2 2 2
3 3.00 - 3.39 No 0 0 0 0 0 1 0 0
4 3.00 - 3.39 No 3 1 2 1 4 3 2 2
5 2.50 - 2.99 No 4 4 4 2 2 2 0 2
PSS9 PSS10 Stress.Value Stress.Label GAD1 GAD2 GAD3 GAD4 GAD5 GAD6
1 4 4 29 High Perceived Stress 2 2 3 2 2 2
2 2 3 24 Moderate Stress 1 2 2 1 1 3
3 0 0 15 Moderate Stress 0 0 0 0 0 0
4 3 2 17 Moderate Stress 2 1 1 1 2 1
5 4 4 32 High Perceived Stress 3 0 3 3 1 1
GAD7 Anxiety.Value Anxiety.Label PHQ1 PHQ2 PHQ3 PHQ4 PHQ5 PHQ6 PHQ7 PHQ8
1 2 15 Severe Anxiety 2 2 3 2 2 2 2 3
2 2 12 Moderate Anxiety 3 2 2 2 2 2 2 2
3 0 0 Minimal Anxiety 0 0 0 0 0 0 0 0
4 2 10 Moderate Anxiety 2 1 2 1 2 1 2 2
5 3 14 Moderate Anxiety 1 3 3 3 1 3 0 3
PHQ9 Depression.Value Depression.Label
1 2 20 Severe Depression
2 2 19 Moderately Severe Depression
3 0 0 No Depression
4 1 14 Moderate Depression
5 3 20 Severe Depression
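The preview suggests that Depression.Value is simply the sum of the nine PHQ-9 items. The sketch below checks this for rows 1, 2, and 4 of the preview and attaches severity bands. The cut-offs are the standard PHQ-9 bands and are an assumption here; the label wording is hypothetical and differs slightly from the dataset's own labels (for example, "No Depression").

```r
# PHQ-9 item responses (PHQ1..PHQ9) copied from rows 1, 2 and 4 of the preview
phq_items <- rbind(
  c(2, 2, 3, 2, 2, 2, 2, 3, 2),
  c(3, 2, 2, 2, 2, 2, 2, 2, 2),
  c(2, 1, 2, 1, 2, 1, 2, 2, 1)
)

# Total score = sum of the nine items (each scored 0-3, so the range is 0-27)
phq_total <- rowSums(phq_items)

# Assumed severity bands: 0-4, 5-9, 10-14, 15-19, 20-27 (standard PHQ-9 cut-offs)
phq_label <- cut(
  phq_total,
  breaks = c(-Inf, 4, 9, 14, 19, 27),
  labels = c("Minimal", "Mild", "Moderate", "Moderately Severe", "Severe")
)

data.frame(phq_total, phq_label)
```

The totals (20, 19, 14) match the Depression.Value column for those rows, which is consistent with a plain item sum.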
library(car)
Loading required package: carData
library(dplyr)
Attaching package: 'dplyr'
The following object is masked from 'package:car':
recode
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Checking and cleaning the data
# Keep only the variables needed for the analysis
mhp <- mhp |>
  select(Gender, Academic_Year, Current_CGPA, waiver_or_scholarship,
         Anxiety.Value, Depression.Value, Stress.Value)

# Rename the variables for clarity
mhp <- mhp |>
  rename(
    CGPA = Current_CGPA,
    Scholarship_Waiver = waiver_or_scholarship,
    Anxiety = Anxiety.Value,
    Depression = Depression.Value,
    Stress = Stress.Value
  )

# Recoding the "Prefer not to say" value as NA in Gender
# Recoding the "Other" value as NA in CGPA
# Recoding the "Other" value as NA in Academic_Year
mhp <- mhp |>
  mutate(
    Gender = na_if(Gender, "Prefer not to say"),
    CGPA = na_if(CGPA, "Other"),
    Academic_Year = na_if(Academic_Year, "Other")
  )

# Drop rows with missing values in any of the three recoded variables
mhp <- mhp |>
  filter(!is.na(Gender) & !is.na(CGPA) & !is.na(Academic_Year))

# Shorten the Academic_Year labels and convert to an ordered factor
mhp <- mhp |>
  mutate(
    Academic_Year = dplyr::recode(
      as.character(Academic_Year),
      "First Year or Equivalent" = "First Year",
      "Second Year or Equivalent" = "Second Year",
      "Third Year or Equivalent" = "Third Year",
      "Fourth Year or Equivalent" = "Fourth Year"
    ),
    Academic_Year = factor(
      Academic_Year,
      levels = c("First Year", "Second Year", "Third Year", "Fourth Year"),
      ordered = TRUE
    )
  )

# Map each CGPA range to a numeric value
mhp <- mhp |>
  mutate(
    CGPA = dplyr::recode(
      as.character(CGPA),
      "Below 2.50" = 2.00,
      "2.50 - 2.99" = 2.50,
      "3.00 - 3.39" = 3.00,
      "3.40 - 3.79" = 3.40,
      "3.80 - 4.00" = 3.80
    ),
    CGPA = as.numeric(CGPA)
  )

write.csv(mhp, "mhp_cleaned.csv", row.names = FALSE)
The data cleaning and processing steps involved several key transformations to prepare the data set for analysis. First, the data set was subset to include only relevant variables (Gender, Academic_Year, Current_CGPA, waiver_or_scholarship, and mental health scores for Anxiety, Depression, and Stress), which were then renamed for clarity. Next, ambiguous responses (“Prefer not to say” in Gender and “Other” in CGPA and Academic_Year) were recoded as NA and the affected rows were removed to ensure data consistency. The Academic_Year values were standardized (e.g., “First Year or Equivalent” became “First Year”) and converted into an ordered factor for meaningful comparisons. Similarly, CGPA was recoded from categorical ranges to a numeric value equal to the lower bound of each range (e.g., “2.50 - 2.99” became 2.50), with “Below 2.50” coded as 2.00. Because these values are lower bounds rather than midpoints, the numeric CGPA slightly understates students’ actual CGPA. Finally, the cleaned data set was saved as a CSV file, mhp_cleaned.csv. These steps ensured the data set was tidy, with consistent formatting and no extraneous or ambiguous entries, making it suitable for statistical analysis.
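The cleaning logic can be sketched on a small made-up data frame. The four rows are hypothetical, and a named lookup vector stands in for dplyr::recode() in the CGPA mapping; this is a swapped-in alternative, not the code used in the report.

```r
library(dplyr)

# Hypothetical toy rows, NOT taken from the survey data
toy <- data.frame(
  Gender = c("Female", "Male", "Prefer not to say", "Male"),
  Current_CGPA = c("2.50 - 2.99", "Other", "3.00 - 3.39", "Below 2.50"),
  stringsAsFactors = FALSE
)

# Lookup from CGPA range to the numeric code used in the report
cgpa_lower <- c(
  "Below 2.50" = 2.00, "2.50 - 2.99" = 2.50, "3.00 - 3.39" = 3.00,
  "3.40 - 3.79" = 3.40, "3.80 - 4.00" = 3.80
)

toy_clean <- toy |>
  # Turn ambiguous answers into NA ...
  mutate(
    Gender = na_if(Gender, "Prefer not to say"),
    Current_CGPA = na_if(Current_CGPA, "Other")
  ) |>
  # ... drop the affected rows ...
  filter(!is.na(Gender), !is.na(Current_CGPA)) |>
  # ... and map the remaining ranges to numbers
  mutate(CGPA = unname(cgpa_lower[Current_CGPA]))

toy_clean
```

Two of the four toy rows survive: the "Other" CGPA row and the "Prefer not to say" row are dropped, mirroring how the full data set shrinks from 2,028 to 1,782 responses.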
Analyzing the descriptive statistics
head(mhp, n =4)
Gender Academic_Year CGPA Scholarship_Waiver Anxiety Depression Stress
1 Female Second Year 2.5 No 15 20 29
2 Male Third Year 3.0 No 12 19 24
3 Male Third Year 3.0 No 0 0 15
4 Male Third Year 3.0 No 10 14 17
Gender Academic_Year CGPA Scholarship_Waiver
Length:1782 First Year :444 Min. :2.000 Length:1782
Class :character Second Year:374 1st Qu.:2.500 Class :character
Mode :character Third Year :565 Median :3.000 Mode :character
Fourth Year:399 Mean :3.076
3rd Qu.:3.400
Max. :3.800
Anxiety Depression Stress
Min. : 0.00 Min. : 0.00 Min. : 0.0
1st Qu.: 8.00 1st Qu.: 9.00 1st Qu.:19.0
Median :13.00 Median :15.00 Median :22.0
Mean :12.36 Mean :14.43 Mean :22.9
3rd Qu.:17.00 3rd Qu.:19.00 3rd Qu.:27.0
Max. :21.00 Max. :27.00 Max. :40.0
ggplot(mhp, aes(x = CGPA, y = Stress, color = Gender)) +
  geom_point(alpha = 0.4)
The cleaned and refined dataset now includes 1,782 university students, categorized by gender (character type), academic year (First Year: 444, Second Year: 374, Third Year: 565, Fourth Year: 399), and scholarship waiver status (character type). The CGPA scores range from 2.0 to 3.8, with a median of 3.0 and a mean of 3.08, indicating a slight right skew. Mental health metrics reveal that anxiety scores range from 0 to 21, with a median of 13 and a mean of 12.36, while depression scores range from 0 to 27, with a median of 15 and a mean of 14.43. Stress scores are higher, ranging from 0 to 40, with a median of 22 and a mean of 22.9. The distributions for anxiety and depression are roughly symmetric, whereas stress shows a wider spread, suggesting greater variability in student stress levels. These statistics provide a baseline for further analysis of mental health trends across demographics.
# Gender vs Anxiety.Value plot
ggplot(mhp, aes(x = Gender, y = Anxiety, fill = Gender)) +
  geom_boxplot() +
  labs(
    title = "Box Plot of Anxiety Value by Gender",
    x = "Gender",
    y = "Anxiety Value"
  )

# Gender vs Depression.Value boxplot
ggplot(mhp, aes(x = Gender, y = Depression, fill = Gender)) +
  geom_boxplot(width = 0.5) +
  labs(
    title = "Whisker Plot of Depression Value by Gender",
    x = "Gender",
    y = "Depression Value"
  )

# Gender vs Stress.Value box plot
ggplot(mhp, aes(x = Gender, y = Stress, fill = Gender)) +
  geom_boxplot(width = 0.5) +
  labs(
    title = "Whisker Plot of Stress Value by Gender",
    x = "Gender",
    y = "Stress Value"
  )

# Anxiety.Value vs Depression.Value scatter plot
ggplot(mhp, aes(x = Anxiety, y = Depression)) +
  geom_point(alpha = 0.2, color = "blue") +
  geom_smooth(method = "lm", color = "lightblue", se = FALSE) +
  labs(
    title = "Anxiety vs. Depression Scatter Plot",
    x = "Anxiety Value",
    y = "Depression Value"
  )
`geom_smooth()` using formula = 'y ~ x'
# Anxiety.Value vs Stress.Value scatter plot
ggplot(mhp, aes(x = Anxiety, y = Stress)) +
  geom_point(alpha = 0.4, color = "orange") +
  geom_smooth(method = "lm", color = "lightblue", se = FALSE) +
  labs(
    title = "Anxiety vs. Stress Scatter Plot",
    x = "Anxiety Value",
    y = "Stress Value"
  )
`geom_smooth()` using formula = 'y ~ x'
# Depression.Value vs Stress.Value scatter plot
ggplot(mhp, aes(x = Depression, y = Stress)) +
  geom_point(alpha = 0.4, color = "lightgreen") +
  geom_smooth(method = "lm", color = "lightblue", se = FALSE) +
  labs(
    title = "Depression vs. Stress Scatter Plot",
    x = "Depression Value",
    y = "Stress Value"
  )
`geom_smooth()` using formula = 'y ~ x'
Hypothesis testing
To test the hypothesis that female university students experience higher levels of depression compared to male students, we will perform an independent samples t-test. This test compares the mean depression scores between two independent groups (male and female students).
Null Hypothesis (H₀): There is no difference in depression levels between female and male students (μ_female = μ_male).
Alternative Hypothesis (H₁): Female students have higher depression levels than male students (μ_female > μ_male)
Gender Academic_Year CGPA Scholarship_Waiver Anxiety Depression Stress
1 Female Second Year 2.5 No 15 20 29
2 Male Third Year 3.0 No 12 19 24
3 Male Third Year 3.0 No 0 0 15
4 Male Third Year 3.0 No 10 14 17
5 Male Second Year 2.5 No 14 20 32
6 Male First Year 3.0 No 5 3 18
ggplot(mhp, aes(x = Depression, fill = Gender)) +
  geom_histogram(position = "identity", alpha = 0.6, bins = 15) +
  labs(
    title = "Histogram of Depression Scores by Gender",
    x = "Depression Score",
    y = "Count"
  ) +
  theme_minimal()
Testing equality of variance
Because the depression scores in both groups are roughly normally distributed, we conduct an F-test to check whether the variances of the two groups are equal.
# Conducting F-test to determine whether the sample variances are equal or not
var.test(Depression ~ Gender, data = mhp)
F test to compare two variances
data: Depression by Gender
F = 0.9773, num df = 543, denom df = 1237, p-value = 0.76
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.8489462 1.1295921
sample estimates:
ratio of variances
0.9772958
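The F-test (F = 0.98, p-value = 0.76) gives no evidence that the variances of the two groups differ, so the equal-variance assumption is tenable. We nevertheless use Welch’s t-test, the default in R’s t.test(), because it does not rely on equal variances and gives nearly the same result as Student’s t-test when the variances are similar.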
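As a hedged illustration of how var.test() output can be read, the sketch below uses simulated scores (not the survey data) with the same group sizes as above (544 and 1,238, inferred from the degrees of freedom). The F statistic is the ratio of the two sample variances, first factor level over second, and a large p-value means the data give no evidence against equal variances.

```r
set.seed(1)  # simulated scores, NOT the survey data

sim <- data.frame(
  Gender = rep(c("Female", "Male"), times = c(544, 1238)),
  Depression = rnorm(544 + 1238, mean = 14, sd = 6.6)  # same SD in both groups
)

# F-test of equal variances, as in the report
ft <- var.test(Depression ~ Gender, data = sim)

# The same statistic by hand: variance of Female divided by variance of Male
v <- tapply(sim$Depression, sim$Gender, var)
ratio_by_hand <- unname(v["Female"] / v["Male"])

c(F = unname(ft$statistic), by_hand = ratio_by_hand, p_value = ft$p.value)
```

A ratio close to 1 with a confidence interval that contains 1, as in the report's output (0.85 to 1.13), is what similar variances look like.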
# Conducting the t-test
t_test_gender <- t.test(Depression ~ Gender, data = mhp, alternative = "greater")
print(t_test_gender)
Welch Two Sample t-test
data: Depression by Gender
t = 4.6603, df = 1048.1, p-value = 1.782e-06
alternative hypothesis: true difference in means between group Female and group Male is greater than 0
95 percent confidence interval:
1.024034 Inf
sample estimates:
mean in group Female mean in group Male
15.53493 13.95153
Interpreting the results of the t-test
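The test statistic (t = 4.66, df ≈ 1048, p = 1.782e-06) provides very strong evidence that female students have higher mean depression scores than male students. The one-sided 95% confidence interval for the difference in means (female minus male) runs from approximately 1.02 to infinity and excludes zero, supporting our alternative hypothesis. The sample means for female and male students are 15.53 and 13.95, respectively.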
Since the p-value is extremely small (much less than 0.05), we can reject the null hypothesis in favor of the alternative hypothesis. The results of the hypothesis test provide strong evidence that female students experience significantly higher levels of depression compared to male counterparts. As a result our hypothesis is supported by the sample data.
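To make the mechanics of the test concrete, the following sketch recomputes Welch's t statistic, its Welch-Satterthwaite degrees of freedom, and the one-sided p-value by hand, then compares them with t.test(). The scores are simulated (the group means and standard deviation are assumptions chosen to resemble the report), so the numbers will not reproduce the output above.

```r
set.seed(2)  # simulated scores, NOT the survey data

female <- rnorm(544, mean = 15.5, sd = 6.6)
male   <- rnorm(1238, mean = 14.0, sd = 6.6)

# Squared standard errors of the two group means
se2_f <- var(female) / length(female)
se2_m <- var(male) / length(male)

# Welch t statistic for the difference (female minus male)
t_by_hand <- (mean(female) - mean(male)) / sqrt(se2_f + se2_m)

# Welch-Satterthwaite approximation to the degrees of freedom
df_by_hand <- (se2_f + se2_m)^2 /
  (se2_f^2 / (length(female) - 1) + se2_m^2 / (length(male) - 1))

# One-sided p-value for H1: mean(female) > mean(male)
p_by_hand <- pt(t_by_hand, df = df_by_hand, lower.tail = FALSE)

# t.test() uses Welch's version by default (var.equal = FALSE)
welch <- t.test(female, male, alternative = "greater")

c(t = t_by_hand, df = df_by_hand, p = p_by_hand)
```

With the formula interface used in the report (Depression ~ Gender), the difference is taken as first factor level minus second, which is Female minus Male under alphabetical ordering, so alternative = "greater" matches H₁.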
Model Comparison
Key Variables
Response Variable (Dependent Variable): The response variable for our model analysis is Depression. It’s a numeric variable that represents the level of depression reported by each student.
Explanatory Variable (Main Independent Variable): The main independent variable of our model is Gender, which is hypothesized to affect depression levels. In addition we will include the following control variables in our model analysis.
Academic_Year (ordinal categorical): We expect this variable to affect depression due to academic stress increasing with time.
CGPA (numeric): Studies on student mental health suggest that academic performance may be associated with mental health.
Scholarship_Waiver (binary categorical): Financial hardship is widely viewed as one of the factors affecting student mental health. This variable may reflect the financial stress of the students.
Anxiety and Stress (numeric): Psychological predictors that often co-occur with depression.
Variable Interaction
We consider an interaction term between Gender and Academic_Year in a later model to explore whether the effect of gender on depression varies by year.
No variable transformation was required at this stage since all variables were in appropriate formats (numeric or categorical). CGPA has already been converted to numeric scale from categories.
Analyzing Regression Models and Comparisons
Model 1 (Baseline Model)
This is a simple model that we run to test the core hypothesis without confounders. The basic linear regression model shows a statistically significant relationship between gender and depression scores among university students. Female students (the reference group) have an average depression score of 15.53, while male students score on average 1.58 points lower. This difference is significant (p < 0.001), indicating that female students report higher levels of depression than male students. Although the overall variance explained by gender is modest (R² = 0.012), the gender effect is consistent and meaningful in the context of mental health outcomes.
model1 <- lm(Depression ~ Gender, data = mhp)
summary(model1)
Call:
lm(formula = Depression ~ Gender, data = mhp)
Residuals:
Min 1Q Median 3Q Max
-15.5349 -4.9515 0.0485 5.0485 13.0485
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.5349 0.2845 54.611 < 2e-16 ***
GenderMale -1.5834 0.3413 -4.639 3.75e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.635 on 1780 degrees of freedom
Multiple R-squared: 0.01195, Adjusted R-squared: 0.01139
F-statistic: 21.52 on 1 and 1780 DF, p-value: 3.748e-06
summary(model1)$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.534926 0.2844644 54.611152 0.000000e+00
GenderMale -1.583392 0.3412883 -4.639455 3.748303e-06
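Statistical significance does not by itself say how large the gender gap is. As a rough, hedged gauge, the coefficient can be expressed as a standardized mean difference (Cohen's d) using only the numbers reported above; in a regression with a single binary predictor, the residual standard error equals the pooled within-group standard deviation.

```r
# Values copied from the Model 1 output above
mean_diff <- 1.5834  # absolute value of the GenderMale coefficient
pooled_sd <- 6.635   # residual standard error (pooled within-group SD)

# Cohen's d: mean difference in units of the pooled standard deviation
cohens_d <- mean_diff / pooled_sd
round(cohens_d, 2)
```

The result is about 0.24, which is a small effect by Cohen's conventional benchmarks and is in line with the low R² of Model 1.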
Model 2 (Theoretical Controls Model)
The second model builds on the base model and adds all the theoretically justified control variables. This model controls for academic level, academic performance, financial aid, and mental health co-morbidities. In Model 2, the predictors explain a substantial portion of the variance in depression scores (Adjusted R² = 0.6095). However, after adjusting for academic year, CGPA, scholarship status, anxiety, and stress, gender is no longer a significant predictor of depression (p = 0.885), suggesting that the gender difference observed in the basic model may be explained by these other factors. Notably, anxiety and stress are strong, highly significant predictors of depression (p < 0.001), and receiving a scholarship or waiver is also associated with higher depression scores (p = 0.009). Other control variables, including academic year and CGPA, do not show statistically significant effects.
# Analyzing the second model
model2 <- lm(Depression ~ Gender + Academic_Year + CGPA + Scholarship_Waiver + Anxiety + Stress, data = mhp)
summary(model2)
Model 3 (Automated Selection Model)
This model uses step-wise AIC-based selection to identify the most parsimonious model. We began with the full model (Model 2), which includes all theoretically relevant predictors. The step-wise procedure progressively removed variables that did not contribute meaningfully to model fit, resulting in a model with only three predictors: Scholarship_Waiver, Anxiety, and Stress. These variables showed the strongest association with depression levels and contributed most to minimizing the AIC. Gender, Academic_Year, and CGPA were excluded, suggesting that they did not add predictive value beyond what was explained by anxiety, stress, and scholarship status. This suggests that psychological and financial factors are the strongest predictors of depression in this data set.
# Step-wise AIC selection starting from the full model (Model 2)
model2 <- lm(Depression ~ Gender + Academic_Year + CGPA + Scholarship_Waiver + Anxiety + Stress, data = mhp)
step_model <- step(model2, direction = "both")
summary(step_model)
Call:
lm(formula = Depression ~ Scholarship_Waiver + Anxiety + Stress,
data = mhp)
Residuals:
Min 1Q Median 3Q Max
-23.4132 -2.8221 0.0038 2.7416 22.7285
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.62413 0.35853 1.741 0.0819 .
Scholarship_WaiverYes 0.62874 0.23902 2.631 0.0086 **
Anxiety 0.82467 0.02344 35.180 < 2e-16 ***
Stress 0.15197 0.01927 7.887 5.35e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.166 on 1778 degrees of freedom
Multiple R-squared: 0.6109, Adjusted R-squared: 0.6102
F-statistic: 930.5 on 3 and 1778 DF, p-value: < 2.2e-16
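One point worth spelling out is that step() ranks models with extractAIC(), which for a linear model differs from AIC() by a constant that depends only on the sample size. The sketch below, on simulated data, checks that constant; it is offered as an explanation of why the AIC values of roughly 5,090 to 5,100 in the comparison table and the value of 10,150.66 from AIC() reported for the final model are on different scales.

```r
set.seed(3)  # simulated data, NOT the survey data

n <- 200
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
fit <- lm(y ~ x)

aic_full <- AIC(fit)             # based on the full Gaussian log-likelihood
aic_step <- extractAIC(fit)[2]   # n * log(RSS / n) + 2 * p, as used by step()

# Constant offset between the two scales for a linear model with n observations
offset <- n * (1 + log(2 * pi)) + 2

c(AIC = aic_full, extractAIC = aic_step,
  difference = aic_full - aic_step, expected = offset)

# The same offset for the report's sample size (n = 1782), about 5059
1782 * (1 + log(2 * pi)) + 2
```

Because the offset is identical for every model fitted to the same rows, either scale gives the same ranking; only values on the same scale should be compared with each other.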
Model 4 (Interaction Model)
This model tests whether the effect of gender on depression changes across academic years. The results show that neither gender, academic year, nor their interaction terms are statistically significant predictors of depression. This indicates that the effect of gender on depression does not significantly change across academic years. However, anxiety and stress remain strong, highly significant predictors of depression (p < 0.001), and receiving a scholarship or waiver is also associated with higher depression levels (p = 0.009). The model explains a substantial portion of the variance (Adjusted R² = 0.6093) and performs nearly identically to Model 2 and the automated selection model in terms of explanatory power. This suggests that adding interaction terms does not improve the model, and that the core drivers of depression remain anxiety, stress, and financial aid status, rather than gender or academic year.
# Testing a model with potential variable interaction
model4 <- lm(Depression ~ Gender * Academic_Year + CGPA + Scholarship_Waiver + Anxiety + Stress, data = mhp)
summary(model4)
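Individual interaction coefficients can each be non-significant while still mattering jointly, so a nested-model F-test is a useful complement to reading the coefficient table. The sketch below shows the idea on simulated data with no true interaction; the variable names mirror the report, but the data, the reduced set of predictors, and the unordered Academic_Year factor are assumptions made for illustration.

```r
set.seed(4)  # simulated data, NOT the survey data

n <- 400
years <- c("First", "Second", "Third", "Fourth")
sim <- data.frame(
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  Academic_Year = factor(sample(years, n, replace = TRUE), levels = years),
  Anxiety = rnorm(n, mean = 12, sd = 5)
)
# Depression depends on Anxiety only: no gender, year, or interaction effect
sim$Depression <- 2 + 0.8 * sim$Anxiety + rnorm(n, sd = 4)

additive         <- lm(Depression ~ Gender + Academic_Year + Anxiety, data = sim)
with_interaction <- lm(Depression ~ Gender * Academic_Year + Anxiety, data = sim)

# Joint F-test of the three Gender x Academic_Year interaction terms
cmp <- anova(additive, with_interaction)
cmp
```

A large p-value in the second row indicates that the interaction terms, taken together, do not improve the fit, which is the same conclusion the report draws for Model 4.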
We check for multicollinearity between the two main predictors, Stress and Anxiety. Both predictors have a variance inflation factor (VIF) of about 1.68, well below the common threshold of 5, so there is no serious multicollinearity between Stress and Anxiety. Although the two are conceptually related (both are psychological variables), they do not overlap enough to distort our proposed regression model. We can, therefore, include both predictors in the model.
model5 <- lm(Depression ~ Stress + Anxiety, data = mhp)
vif(model5)
Stress Anxiety
1.680506 1.680506
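The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. With only two predictors, R² is the squared correlation between them, so the reported VIF can be inverted to see how strongly Stress and Anxiety are related. The sketch uses only the value printed above.

```r
vif_reported <- 1.680506           # from the vif() output above

# Invert VIF = 1 / (1 - R^2) to recover R^2 between Stress and Anxiety
r_squared <- 1 - 1 / vif_reported

# With two predictors, R^2 is the squared Pearson correlation
r <- sqrt(r_squared)

round(c(r_squared = r_squared, r = r), 3)
```

This gives R² of about 0.405 and an absolute correlation of about 0.64: a substantial overlap, but far from the near-perfect correlation that would make the coefficient estimates unstable.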
Model Comparison and Interpretation
Analyzing the model performance
| Model | Description | Gender Effect | Statistical Significance | Interpretation |
|---|---|---|---|---|
| Model 1 | Depression ~ Gender | -1.58 | Significant (p < 0.001) | Female students report significantly higher depression |
| Model 2 | Gender + theoretical controls | -0.03 | Not significant (p = 0.885) | Gender effect disappears after adjusting for Anxiety, Stress, etc. |
| Model 3 | Automated variable selection (no Gender) | Not included | N/A | Gender excluded; it did not improve model fit |
| Model 4 | Interaction: Gender * Academic_Year | Not significant | Not significant | No evidence that gender’s effect varies across academic years |
Model fit comparison
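| Model | Adjusted R-squared | AIC | Interpretation |
|---|---|---|---|
| Model 1 | 0.011 | ~6746 | Low explanatory power |
| Model 2 | 0.6095 | ~5098 | Added predictors improve explanatory power |
| Model 3 | 0.6102 | ~5089 | Best AIC and more parsimonious than Model 2 |
| Model 4 | 0.6093 | ~5100 | Adding interaction makes no improvement |

Note: AIC values in this table are on the scale used by step() (extractAIC()), which differs from AIC() by a constant for a fixed sample size. The Model 1 value follows from its residual standard error (6.635 on 1,780 degrees of freedom), and the Model 3 adjusted R-squared is taken from its summary output.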
Summary of model comparison
Model comparisons reveal that while Gender is a significant predictor of Depression in the unadjusted Model 1, its effect disappears once control variables are added in the subsequent models. Models 2 through 4 all show strong model fit, with Adjusted R² around 0.61, but Model 3, selected through automated variable selection, has the lowest AIC, indicating the best balance between fit and simplicity. This model retains only Anxiety, Stress, and Scholarship_Waiver as significant predictors. The inclusion of interaction terms in Model 4 does not enhance the model’s explanatory power. Overall, the analysis suggests that depression among students is most strongly associated with psychological and financial factors, rather than with demographic characteristics like gender or academic year.
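A comparison table like the ones above can be assembled directly from the fitted models, which avoids copying numbers by hand. The following is a minimal sketch on simulated data; the three models and their names are hypothetical stand-ins for the report's models, and AIC() is used throughout so that all values are on one scale.

```r
set.seed(5)  # simulated data, NOT the survey data

n <- 300
sim <- data.frame(
  Anxiety = rnorm(n, mean = 12, sd = 5),
  Stress  = rnorm(n, mean = 23, sd = 6),
  Gender  = sample(c("Female", "Male"), n, replace = TRUE)
)
sim$Depression <- 1 + 0.8 * sim$Anxiety + 0.15 * sim$Stress + rnorm(n, sd = 4)

# Hypothetical candidate models
models <- list(
  gender_only   = lm(Depression ~ Gender, data = sim),
  with_controls = lm(Depression ~ Gender + Anxiety + Stress, data = sim),
  no_gender     = lm(Depression ~ Anxiety + Stress, data = sim)
)

# One row per model: adjusted R-squared and AIC
comparison <- data.frame(
  model  = names(models),
  adj_r2 = sapply(models, function(m) summary(m)$adj.r.squared),
  AIC    = sapply(models, AIC),
  row.names = NULL
)

comparison
```

Lower AIC and higher adjusted R² both favor a model, but AIC values are only comparable across models fitted to exactly the same observations.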
Final model selection
The final regression model, which includes Gender along with the predictors chosen by automated variable selection, demonstrates strong overall fit (Adjusted R² = 0.610; AIC = 10150.66 as computed by AIC(), which is on a different scale from the step() values in the comparison table). Although Gender is not statistically significant (p = 0.89), it is retained because of its theoretical importance to the research hypothesis. The most significant predictors of depression are Anxiety (p < 0.001) and Stress (p < 0.001), both showing strong positive associations with depression levels. Additionally, receiving a Scholarship/Waiver is associated with significantly higher depression scores (p = 0.009), suggesting a potential link to financial or academic stress. This final model balances theoretical relevance and statistical efficiency.
# Constructing the final model
final_model <- lm(Depression ~ Gender + Scholarship_Waiver + Anxiety + Stress, data = mhp)
summary(final_model)
Call:
lm(formula = Depression ~ Gender + Scholarship_Waiver + Anxiety +
Stress, data = mhp)
Residuals:
Min 1Q Median 3Q Max
-23.4002 -2.8129 0.0033 2.7406 22.7363
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.65295 0.41489 1.574 0.11571
GenderMale -0.03007 0.21767 -0.138 0.89013
Scholarship_WaiverYes 0.62614 0.23983 2.611 0.00911 **
Anxiety 0.82458 0.02346 35.153 < 2e-16 ***
Stress 0.15170 0.01937 7.830 8.33e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.167 on 1777 degrees of freedom
Multiple R-squared: 0.6109, Adjusted R-squared: 0.61
F-statistic: 697.5 on 4 and 1777 DF, p-value: < 2.2e-16
# Checking model fit: AIC and adjusted R-squared
AIC(final_model)
[1] 10150.66
Regression diagnostics for the final model
plot(final_model)
Residuals vs fitted plot
The residuals are fairly evenly scattered around the horizontal line, especially in the lower and higher ranges of fitted values. However, the spread is slightly wider in the middle range of fitted values, which hints at mild heteroscedasticity: the variance of the residuals may not be constant across all levels of predicted Depression. Overall, the plot looks acceptable and does not show severe violations of the model assumptions.
Q-Q plot
Most points lie very close to the line, especially in the center, which suggests that the residuals are approximately normally distributed. However, points at the far left and far right deviate slightly from the line (for example, observations 64, 992, and 1013), indicating somewhat heavier tails than expected under normality.
Scale-Location plot
The trend line appears fairly flat, indicating that the spread of the residuals is relatively uniform across the range of fitted values. There is no clear funnel shape, which would suggest heteroscedasticity. This plot therefore supports the assumption of homoscedasticity: the variance of the residuals is fairly constant across the fitted values.
Residuals vs Leverage plot
The reported Cook’s distance values (0.002–0.008) are all below typical thresholds (0.5), suggesting no extremely influential points in this plot. The absence of points far beyond Cook’s distance lines suggests the model is not heavily influenced by outliers. To summarize, the plot suggests a relatively stable model with no extreme influential points, but a few high-leverage observations may need review.
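The visual checks can be complemented with numeric ones. The sketch below, on simulated data, computes Cook's distance and leverage for each observation and flags cases that exceed two common rules of thumb (Cook's distance above 4/n, leverage above 2p/n). These cut-offs are conventions assumed for illustration, not thresholds used in the report.

```r
set.seed(6)  # simulated data, NOT the survey data

n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

cooks <- cooks.distance(fit)  # influence of each observation on the fit
lev   <- hatvalues(fit)       # leverage of each observation
p     <- length(coef(fit))    # number of estimated coefficients

# Assumed rules of thumb: Cook's D > 4/n or leverage > 2p/n
flagged <- which(cooks > 4 / n | lev > 2 * p / n)

head(data.frame(obs = flagged, cooks = cooks[flagged], leverage = lev[flagged]))
```

Flagged observations are candidates for a closer look, not automatic exclusions; refitting the model without them and comparing coefficients is one way to judge whether they matter.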